DOCUMENT RESUME

ED 338 671                                              TM 017 476

AUTHOR         McArthur, David L.
TITLE          Computerized Diagnostic Testing: Problems and
               Possibilities.
INSTITUTION    Center for Research on Evaluation, Standards, and
               Student Testing, Los Angeles, CA.
SPONS AGENCY   National Inst. of Education (ED), Washington, DC.
REPORT NO      CSE-S-255
PUB DATE       85
NOTE           22p.
PUB TYPE       Reports - Evaluative/Feasibility (142)

EDRS PRICE     MF01/PC01 Plus Postage.
DESCRIPTORS    *Adaptive Testing; Classroom Research; *Computer
               Assisted Testing; *Diagnostic Tests; Educational
               Diagnosis; Inferences; Reading Comprehension; Reading
               Tests; Test Construction; *Testing Problems
IDENTIFIERS    Open Systems Theory; *Uncertainty



ABSTRACT 

The use of computers to build diagnostic inferences is explored in
two contexts. In computerized monitoring of liquid oxygen systems for
the space shuttle, diagnoses are exact because they can be derived
within a world which is closed. In computerized classroom testing of
reading comprehension, programs deliver a constrained form of adaptive
testing and error performance summary. However, the world is open;
diagnostic inferences cannot be made with precision, and additional
practical factors play an important role in delimiting the usefulness
of such a system. Problems of uncertainty, negation, and
non-deterministic prediction are also discussed. If test materials for
computerized administration can be designed within tightly controlled
parameters, and if the diagnostic strategy can be strongly tied to
theory about performance errors within the topic domain, then many of
the ambiguities of diagnostic inference will be closer to resolution.
It is concluded that, given the number of limits both of a
philosophical nature and in reference to actual testing practice, the
role of computers will be supplementary to the educational diagnostic
specialist. There is a 14-item list of references. (Author/SLD)
APPLIED STUDIES IN COMPUTERIZED DIAGNOSTIC TESTING:
IMPLICATIONS FOR PRACTICE

ABSTRACT

The use of computers to build diagnostic inferences is explored in two
contexts. In computerized monitoring of liquid oxygen systems for the
space shuttle, diagnoses are exact because they can be derived within a
world which is closed. In computerized classroom testing of reading
comprehension, programs deliver a constrained form of adaptive testing and
error performance summary. However, the world is open: diagnostic
inferences cannot be made with precision, and additional practical factors
play an important role in delimiting the usefulness of such a system.
Problems of uncertainty, negation, and nondeterministic prediction are also
discussed.






Introduction 

Because of modern computer hardware and software, an intelligent
system for diagnostic testing which incorporates the advantages of
computerized management with the latest theoretical developments in
diagnostic test strategy is no longer locked in the world of science
fiction. In theory, a small computer could manage an individualized
adaptive testing session, drawing on a bank of diagnostically relevant test
items, making real-time decisions about competing diagnostic hypotheses
based on the incoming stream of responses. In theory, even if premised on
a rough set of diagnostic indicators, such a system ought to generate a
functional summary of the performance of an examinee. Because the task of
diagnosis in its most elementary form is simply one of identifying
consistent patterns of examinee behavior, it seems an ideal task for the
computer.

In reality, of course, neither does the naive view of the diagnostic
process portrayed above hold true for a moment, nor does blind application
of high-technology computer programming circumvent an array of decisions
about the nature of performance and its context, the structure of
performance testing, and a virtual guarantee of multiple uncertainties in
interpretation. Important problems arise in programming a
pattern-diagnostic inferencer to diagnose performance as it occurs, in
operating that program, and in deriving meaningful diagnostics from its
outcomes. Finally, even in the best of circumstances, improvements in the
computability of diagnostic testing hinge on developments in computer
software and diagnostic theory which have yet to occur.






Lest the reader feel that this viewpoint is unduly pessimistic, we
note that computerized diagnostic testing is functioning at this moment in
fields as diverse as reading comprehension and the launching of space
vehicles. Their salient features, and extensions to educational diagnostic
testing using computers, are the subject of this paper. Because of its
success, the space shuttle diagnostic system can be used to illustrate
critically important conceptual underpinnings of the diagnostic process,
which are generally lacking from diagnostic strategies in education and
psychology.

The earliest attempt at what would now be called computerized
diagnostics was generated by a "teaching-learning machine" designed by
Pressey (1926). A ratchet-driven device, not unlike a manual typewriter,
presented selected test items in a viewing window; responses were made on a
specialized keyboard and scored mechanically. The process was envisioned
as labor-saving, to "leave the teacher more free for her most important
work, for developing in her pupils fine enthusiasms, clear thinking, and
high ideals" (p. 376). More recent work in computerized diagnostic testing
in educational settings has been discussed in Bejar (1984), McArthur and
Cabello (1985), McArthur and Choppin (1984), Mitchell (1982), and Schwartz
(1984). In reference to computerized psychological testing, Roid (1985)
presents an extended overall analysis, though sketchy on the issue of
diagnosis. To summarize, all these writers agree that the potential of
computers applied to the particular tasks of administering, scoring, and
supplying the bases for test interpretation looks genuinely good. Indeed,
a large amount of computer code to accomplish computer-managed testing is
included in Schwartz's book, and several test publishers have begun
marketing aggressively in this area.

One reason for optimism is the power of the latest generation of small
computers. Computer hardware was formerly a major bottleneck in
implementing computerized diagnosis. Not long ago, few machines were
capable of handling the job without severe restrictions on speed, memory,
storage, and ancillary capabilities. In less than a decade, computer
technology has leapt forward in ways which now allow extraordinarily
complex logical and mathematical operations to be implemented very
rapidly. Most restrictions that used to apply are gone, since reliable
hardware can now include not only keyboard and video display, but also
voice synthesizer, voice recognition device, and real-time graphics.
Highly veridical problem simulations are now possible. Alternatively, if
the testing is only a matter of presenting text to an examinee and waiting
for a keystroke response, then modern lap-top computers suffice nicely. In
sum, hardware no longer poses a significant barrier to the development of
diagnostic tools.

The task of computerized diagnosis is much more demanding on computer
software. Both logical and mathematical operations must work in an
environment of real-time (respond now to this test item) and periodic (save
the examinee's response pattern in long-term memory) operations.
Fortunately, a number of programming languages are equipped to handle these
composite requirements. An important software problem which is more
difficult to solve is the handling of exception conditions. Exceptions
occur when the program encounters some action or data which it is not
prepared to handle. While some languages return a nil, a default, or an
explicit "don't know," others respond to that event with total program
failure. When exception handling is added to the requirements noted above,
no single programming language emerges as the perfect software vehicle for
computerized testing. Ideally, a real-time oriented language for
programming of computerized diagnosis would include extensive facilities
for error-trapping as well as both symbolic manipulation and arithmetic
computation. Even advanced languages like Modula-2, C, and LISP, and CAI
production systems like PILOT, incorporate only some solutions to these
issues, so the final choice awaits further developments in software
technology.
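
The exception-handling concern can be illustrated with a minimal sketch in
Python, a language chosen here only for illustration. The answer key, the
sentinel value, and the routine name score_response are all hypothetical and
are not drawn from any system discussed in this paper. The routine traps
input it was not prepared for and returns an explicit "don't know" rather
than failing.

```python
# Hypothetical answer key: item id -> keyed response (illustration only).
ANSWER_KEY = {"item1": "B", "item2": "D"}

DONT_KNOW = "don't know"  # explicit sentinel returned instead of failing


def score_response(item_id, keystroke):
    """Return True/False for a scorable response, DONT_KNOW otherwise."""
    try:
        keyed = ANSWER_KEY[item_id]          # KeyError for an unknown item
        answer = keystroke.strip().upper()   # AttributeError if not a string
    except (KeyError, AttributeError):
        return DONT_KNOW
    if answer not in {"A", "B", "C", "D"}:
        return DONT_KNOW                     # unrecognized key: trap it
    return answer == keyed
```

The design choice is that every unexpected condition maps to one visible
value that later diagnostic steps can count, rather than to a halted
testing session.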

A Closed-World Diagnostic Inferencer 
The space shuttle launch monitoring system described by Scarl,
Jamieson and Delaune (1985) serves as an excellent model for computerized
diagnostics, on the one hand because it is highly effective and on the
other because the world in which it works is well formulated. Space
shuttles are launched under extraordinarily tight controls, with thousands
of critical indicators being monitored and evaluated continuously by
computer. Recently, the monitoring of liquid oxygen activity (valves,
pipes, tanks, flow rates, pressures and the like) has been accomplished by
a computerized expert diagnostic system, operating as an intelligent
watchdog, with the capability of quickly isolating and interpreting any
error anywhere within its purview. Its diagnostic strategy is strongly
predicated on the notion that a truly simultaneous occurrence of independent
errors is highly unlikely. Far more probable is a failure in some
component which has consequences felt immediately or soon thereafter in
several places further along the chain.

What happens when the liquid oxygen expert "discovers" that one or
several sensors are reporting values out of normal range is what makes it
an excellent expert to study from the point of view of diagnosis. In the
simplest case, the system receives indication of a single-point
error: a single sensor registers abnormally high or low. The expert
"knows" enough to assess the degree of criticality of that component, and
to "understand" whether failure at that point in the complex array of
liquid oxygen circuitry should have consequences felt downstream by other
sensors. If all downstream indicators are reporting clear, the most likely
explanation of this single erroneous indication is sensor failure.

If, on the other hand, a cluster of errors is suddenly reported
together, the expert evaluates the root cause of such multi-point failure
in two ways. The first is a method of set intersections, using assumptions
about the state in which matters would have to be in order to produce
those values being received at this time. The second is a method based on
simultaneous hypothesis testing, using the logic of a propagating error
tree which is tested in increasing depth until a point source of the error
is isolated.
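
The single-point and multi-point cases can be sketched as follows. This is
an illustration under stated assumptions, not the method of the shuttle
system itself: the four-component chain, the DOWNSTREAM map, and the
functions possible_sources and diagnose are hypothetical. A lone abnormal
sensor whose downstream neighbors read clear is attributed to the sensor;
a cluster is traced to a common source by intersecting the sets of
components that could have produced each abnormal reading.

```python
# Hypothetical plumbing: component -> components it affects downstream.
DOWNSTREAM = {
    "tank": {"valve", "pipe", "pump"},
    "valve": {"pipe", "pump"},
    "pipe": {"pump"},
    "pump": set(),
}


def possible_sources(sensor):
    """Components whose failure could make this sensor read abnormally."""
    upstream = {c for c, affected in DOWNSTREAM.items() if sensor in affected}
    return upstream | {sensor}


def diagnose(abnormal):
    """Return (kind, candidates) for a set of sensors reading out of range."""
    abnormal = set(abnormal)
    if not abnormal:
        return ("nominal", set())
    if len(abnormal) == 1:
        (sensor,) = abnormal
        # Only one sensor is abnormal, so everything downstream reads clear.
        # If a real fault here should have propagated, blame the sensor.
        if DOWNSTREAM[sensor]:
            return ("sensor failure", {sensor})
        return ("component or sensor failure", {sensor})
    # Multi-point case: intersect the candidate sources of every reading.
    candidates = set.intersection(*(possible_sources(s) for s in abnormal))
    # Prefer sources whose whole downstream footprint matches the reports.
    consistent = {c for c in candidates if DOWNSTREAM[c] | {c} == abnormal}
    return ("component failure", consistent or candidates)
```

For example, diagnose({"valve"}) reports a sensor failure, whereas
diagnose({"valve", "pipe", "pump"}) isolates the valve as the point source.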

The liquid oxygen diagnostic system operates in a closed world. Its
sensors cover the entire domain, and errors within that domain are
registered unambiguously. As long as the system's programmers have
properly placed each sensor and have accounted for any unique operating
characteristics or "quirks," no error of any consequence whatsoever will go
undetected. For such a system, the world beyond its sensors need not be
considered because every plausible diagnostic possibility has already been
included.

Assumptions for the Closed World 
The requirement for a closed-world diagnostic system is a test domain
in which all possible faults may be enumerated discretely, and in which
each single source of data may be pegged, in advance, as to its range of
reporting values. The introduction of a single fault not contained in the
list of known faults, or of a single datum of unknown character, exceeds
the closed world and destroys its advantage. A key part of the advantage
of a completely closed world is that all of the operating characteristics
of that world can be known exactly. They need not be explicated in their
entirety, but by virtue of their availability, the closed-world
inferencer has the resources to evaluate any plausible permutation of
events.

Suppose we are interested in diagnosing faults in a contained domain
like the liquid oxygen system with the constraint that we do not yet know
exact tolerances for many of the sensors. We could proceed by discarding
the evidence shown by those sensors altogether and instead use only those
pieces of evidence whose shape we know in advance.
We could allow the shuttle to be launched under a series of controlled
trials regardless of the data until we amass a repertoire of interrelated
cause-and-effect relationships between sensor reports and final outcome,
using that experience to build a library of allowable values. We could
attempt to corroborate the multiple data from sensors from the present
shuttle with salient aspects of previous experience, then use such
experience as a selective and conditional guide to completing the present
task. Multiple avenues could be productively explored given enough
resources, such that a suitable diagnostic evaluation eventually could be
made of liquid oxygen system activity. Obviously, however, the operational
advantage lies with the system which need not engage in strictly
exploratory behaviors before being able to form diagnostic conclusions.

Two additional considerations must also be made: the first concerns
the nonintermittency of signals while the second concerns the granularity
of the data received. Nonintermittency is a strong assumption within the
liquid oxygen diagnostic system. While each sensor is capable of
generating a continuous stream of data, sensing of the status of any given
sensor occurs at discrete intervals. Any sensed value is expected to be
regular. That is, stable readings are seen as far more likely, from the
point of view of diagnostic interpretation, than are wild fluctuations
within short intervals. Indeed, intermittent fluctuations are more readily
interpretable as sensor errors and noise than as diagnostically relevant
indicators.
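
One way to picture the nonintermittency assumption is a persistence filter
over the discrete samples. The sketch below is an assumption made for
illustration; the function stable_alarm and its persistence parameter are
hypothetical and are not taken from the shuttle system. An out-of-range
reading is treated as diagnostically relevant only if it persists for a set
number of consecutive samples; shorter excursions are discounted as noise.

```python
def stable_alarm(readings, low, high, persistence=3):
    """Return the index where an out-of-range run of `persistence` samples
    begins, or None if every excursion is shorter (treated as noise)."""
    run_start = None
    run_length = 0
    for i, value in enumerate(readings):
        if low <= value <= high:
            run_start, run_length = None, 0   # back in range: reset the run
            continue
        if run_length == 0:
            run_start = i                     # first sample of a new run
        run_length += 1
        if run_length >= persistence:
            return run_start
    return None
```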

The second additional consideration is that the data received from the
various sensors are at the functional granularity demanded by the
diagnostic inference process. This means, on one hand, that no aggregation
of incoming values need be made prior to using them diagnostically, and, on
the other, that no step in the diagnostic process will require finer shades
of data than are being delivered. The granularity of data, in this
instance, is optimal.
Diagnostic Inference in an Open World
While there are several approaches which have been taken to building
computerized diagnostic testing in the domain of reading comprehension,
few of the assumptions used in closed-world diagnostics carry over.
Reading comprehension is a domain for which few theorists, if any, have
attempted to formalize all of the likely diagnostic indicators of erroneous
performance. The diagnostic inference process in reading comprehension,
even as practiced by professionals, represents more than simple rules of
procedure and logical chaining of consequences. Characteristically the
process reflects an accumulation of overlapping evidence, plus both common
sense and, for lack of a better descriptor, professional acumen. Neither
of the latter factors is especially amenable to computerization.
Nonetheless, computer programs now exist which are capable of adaptively
presenting a limited scope of reading test items and deriving from the
response pattern a composite error summary.

In a domain such as reading comprehension, the scope of a student's
misunderstanding can be quite large, so large as to make it exceedingly
difficult to predict all possible errors. The likelihood of a single-point
error is slim, since so few errors in reading are unitary. Most often an
examinee will demonstrate multiple errors, yet isolation of a single cause
of a multi-point error cannot use a system of tracing error propagation
because no theory of reading yet includes one. It is rather unlikely that
the data from such testing can be construed as always nonintermittent, and
even more unlikely that the raw responses are at an optimal level of
granularity. For computerized diagnostic testing of reading comprehension
the benefits of closed-world assumptions do not hold. The relatively
simple rules which allow noisy data to be cast out cannot be applied. Even
in the most carefully prepared of the present systems for appraising
reading comprehension skills, what constitutes a diagnostically useful
pattern of erroneous responses is not completely resolved.

Our experience with a dedicated test administration/feedback system
and a test of reading comprehension skills specifically designed around
diagnostic principles demonstrates some positive outcomes despite the
concerns portrayed above. One hundred and sixteen upper primary pupils
were given a pair of brief computer-managed tests, one a test of pronoun
usage and the other a reading test involving short essays. Items were
calibrated by difficulty and item distractors were keyed as to the type of
error each reflected. Movement from items of moderate difficulty to items
of greater or lesser difficulty was controlled by real-time appraisal of
examinee performance. This movement up or down was significantly related
to examinee grade level and reading skill. An examinee's movement up or
down in one test was generally corroborated by the same movement up or down
in the other test. The pronoun test, which represents one of the more
closed worlds in reading skills, showed a fair degree of performance
consistency within examinees, and a balanced and logical distribution of
error types by skill level across examinees. The comprehension test,
representing a more open-world domain, showed a somewhat less logical error
pattern overall: students who evidently had the capacity to properly
answer items at a middle range of item difficulty frequently stumbled on
simple errors of literal comprehension when answering more difficult
items.
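
The general shape of such a session can be sketched as follows. The actual
movement rule used in our system is not reproduced here; the rule below
(one level up after a correct answer, one level down after an error) is an
assumption made for illustration, and the item format and the function
run_adaptive_test are hypothetical. Each wrong choice is tallied under the
error type to which its distractor was keyed.

```python
from collections import Counter


def run_adaptive_test(items_by_level, responses, start_level):
    """Walk an examinee through difficulty levels.

    items_by_level: level -> list of items; each item is a dict with a
        'key' (correct option) and 'errors' (distractor -> error type).
    responses: callable(item) -> chosen option.
    Returns (list of levels visited, Counter of error types).
    """
    levels = sorted(items_by_level)
    position = levels.index(start_level)
    used = {level: 0 for level in levels}
    path, errors = [], Counter()
    # Stop when the current level has no unused items left.
    while used[levels[position]] < len(items_by_level[levels[position]]):
        level = levels[position]
        item = items_by_level[level][used[level]]
        used[level] += 1
        path.append(level)
        choice = responses(item)
        if choice == item["key"]:
            position = min(position + 1, len(levels) - 1)   # harder next
        else:
            errors[item["errors"].get(choice, "unclassified")] += 1
            position = max(position - 1, 0)                 # easier next
    return path, errors
```

The returned path is the record of movement up or down, and the counter is
the raw material for an error performance summary.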
Practical Requirements

Our testing with a prototype diagnostic inferencer in classroom
settings suggests that a number of the troubles noted in reference to
diagnostic strategy in the open world can be favorably resolved. Heavy
emphasis must be given to designing a test which adequately covers the full
scope of a given topic, and does so with items which possess good
psychometric qualities and well-formed error categories. Based on
technical considerations, it should be pointed out that there are other
requirements that dictate the nature of a test which would be suitable for
computer-managed diagnostic testing in education or psychology.

The first requirement is that the test match the operations of the
computer; its text must fit within an available screen window, its tasks
(type your name, hit this response key) must be unambiguous to the user,
its options (strike this key to go forward or that key to go backward) must
be exceedingly clear. The default instructions for taking a
paper-and-pencil test, so thoroughly ingrained in most students by habit
alone, are not automatically transferred by students to computer testing.
For students who falter as they go to type their name at the keyboard
before the test begins, frustration already mounts.

Second, the test as a whole must be user-safe. That is, the software
must be "fire-walled." At the opening requirement that the student type
their name at the keyboard, there are dozens of possible variations which
must be distinguished by the computer from an incomplete or erroneous
attempt; the software must only move forward when the student is ready. No
matter what logical or illogical key sequence is pressed, the software must
be able to separate a legitimate response from careless keystrokes or
lightly malicious attempts to "fool the system." Few students will try the
latter, but nonetheless a computerized test which succumbs to one pupil's
errant behavior will stand little chance of surviving the remainder of the
allotted time period, simply because students find killing the system a
great deal more interesting than completing the test.
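
A minimal sketch of such fire-walling for the name-entry step follows. The
function accept_name, the permitted character set, and the length limit are
assumptions made for illustration, not a description of our software. The
routine never raises an error on odd input: it either returns a cleaned
name or returns None, in which case the program would simply ask again.

```python
import string

# Assumed alphabet for names: letters, space, hyphen, apostrophe.
ALLOWED = set(string.ascii_letters + " -'")


def accept_name(keystrokes, max_length=30):
    """Return a cleaned name, or None if the entry is not yet acceptable."""
    if not isinstance(keystrokes, str):
        return None
    # Drop control characters and anything outside the allowed alphabet.
    cleaned = "".join(ch for ch in keystrokes if ch in ALLOWED)
    cleaned = " ".join(cleaned.split())       # collapse runs of spaces
    if not any(ch.isalpha() for ch in cleaned):
        return None                           # nothing but stray keys
    return cleaned[:max_length].strip().title()
```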

Third, the test must be intrinsically interesting, separate from the
novelty of its appearance on a videoscreen. Because the experience and
excitement of videogames is almost universal among school children even at
the lowest grades, expectations of what a computer will do are now
jaundiced. Repeated presentation of chunks of text on screen, with a
single keystroke as the sole behavior required of the student, is deadly.
Some students may find themselves striking a response key simply to make
the screen do something, anything; in that instance, all of the ordinary
concerns about random and partially random responding to test items are
exacerbated. The results of computerized testing and any ensuing
diagnostic interpretations become uncertain at best.

Models for Handling Uncertainty 

A diagnostic inferencer which works in anything but a completely
closed-world environment runs headlong into issues of uncertainty. The
variety of ways in which uncertainty can be handled statistically and
probabilistically suggests that no single solution suffices. Pearl (1984)
describes detailed logical and mathematical approaches to the task using an
approach which stems from a Bayesian tradition. Prade (1985) models
imprecision and uncertainty with a deductive system modeled on fuzzy-set
logic. Reggia, Perricone, Nau, and Peng (1985) delineate an abductive
inference process in which plausible causal associations are derived
sequentially by testing symbolic conditional probabilities through set
theory.
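
To make the Bayesian tradition concrete, the following sketch applies
Bayes' rule once to a set of competing diagnostic hypotheses. It is a
generic textbook update, not a reconstruction of any of the cited
formulations; the function bayes_update and the hypothesis labels in the
usage note are hypothetical.

```python
def bayes_update(priors, likelihoods):
    """Posterior over hypotheses after one observed response.

    priors: hypothesis -> prior probability (summing to 1).
    likelihoods: hypothesis -> P(observed response | hypothesis).
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}
```

With two equally likely hypotheses, say a literal-comprehension error and
an inferential error, a response that is four times as probable under the
first shifts the posterior toward it in the same ratio.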

In the context of diagnostic testing, the ways in which uncertainty is
managed are crucial to the diagnostic outcome. An example of this is
negation, deciding when a particular diagnostic hypothesis is no longer
viable. Cohen's (1984) multiple definitions of rule endorsement and
negation form a case in point. The weakest interpretation of negation
(called "ostrich") allows a hypothesis to be negated if positive evidence
in support of the hypothesis does not currently appear in the data. The
strongest interpretation of negation (called "hard-not") requires that hard
evidence in support of the negation of a hypothesis or in support of the
hypothesis' opposite be present in the data. A closed-world assumption
requires that evidence for negation of the hypothesis be present or that a
proof offered in support of the hypothesis fails.
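
The three readings of negation can be contrasted in a short sketch. The
representation of evidence as ("for", h) and ("against", h) pairs, the
function names, and the prove callable are assumptions made for
illustration and are not taken from Cohen (1984).

```python
def weak_negated(hypothesis, evidence):
    """Weakest reading: negate when no supporting evidence is present."""
    return ("for", hypothesis) not in evidence


def hard_negated(hypothesis, evidence):
    """Strongest reading: negate only on explicit evidence against."""
    return ("against", hypothesis) in evidence


def closed_world_negated(hypothesis, evidence, prove):
    """Closed world: explicit counter-evidence, or a failed proof attempt.

    prove: callable(hypothesis, evidence) -> bool, a hypothetical prover.
    """
    return ("against", hypothesis) in evidence or not prove(hypothesis, evidence)
```

A hypothesis about which the data are simply silent is negated under the
weak reading, survives under the hard reading, and is negated in a closed
world only because the attempted proof fails.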

Almost all of the operations undertaken in a diagnostic test in
education are subject to negation at one point or another. The task of
diagnostic testing can be interpreted as an exploration of competing
hypotheses and a weeding out of those hypotheses which are not
receiving support. Because of the uncertainties inherent in test
responses, the removal of a plausible hypothesis from the set of hypotheses
under study is seldom matched by strong evidence. Most frequently, one has
to make use of weak negative information to place that hypothesis on the
back burner, then rely on good luck to isolate diagnostically important
information from the hypotheses which remain. The problem is one of
logistics as well as mathematics: how can one optimize the selection of
avenues to explore without predicting that some paths are likely to be less
fruitful than others? In diagnostic testing in an open-world environment,
these predictions are very difficult to make.

In a parallel domain, a recent contribution to the field of artificial
intelligence sets out a series of transformation methodologies to
determine sequence-generating rules (Dietterich & Michalski, 1985). The
problem, a direct analogue of real-time diagnostic testing, is one of
predicting future behavior by looking back, in varying degrees of depth, at
the evidence so far; that is, estimating what constitutes a meaningful
summary pattern and expectation for the next behavior in sequence based on
some or all of what has gone before. The simplest version of this problem
occurs when the next behavior (or object, or response) is one which is
totally predetermined by every attribute associated with all past objects.
All of the attributes in the preceding string of evidence can be used to
form a perfect prediction of what will come next; the methodological
problem reduces to counting the total number of distinct attributes.
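
The idea of looking back to a chosen depth can be sketched with a simple
lookup table. This is not the cited authors' method; the functions
learn_rule and predict_next are hypothetical. Each run of `depth` past
events is mapped to the events that have followed it, and a prediction is
offered only when the evidence so far is unambiguous.

```python
def learn_rule(sequence, depth):
    """Map each run of `depth` past events to the set of events that followed."""
    table = {}
    for i in range(depth, len(sequence)):
        context = tuple(sequence[i - depth:i])
        table.setdefault(context, set()).add(sequence[i])
    return table


def predict_next(sequence, depth):
    """Return the predicted next event, or None if the evidence is ambiguous
    or the current context has never been seen."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    table = learn_rule(sequence, depth)
    followers = table.get(tuple(sequence[-depth:]), set())
    return next(iter(followers)) if len(followers) == 1 else None
```

For the sequence "ABACABA", looking back one event is ambiguous because A
has been followed by both B and C, while looking back two events yields a
single prediction.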

In a nondeterministic prediction problem, the occurrence of the next
piece of evidence may or may not entirely fit the string of evidence
collected to date. Certain subsets of attributes may play more significant
roles in determining the next evidence than others. The goal is to find
plausible and parsimonious descriptors of key patterns underlying the
evidence. This is a close equivalent to the kind of diagnostic process
seen in the non-closed-world systems discussed earlier. The methodological
problems are significant, for the solution requires understanding which
attributes of the evidence collected to date actually contribute
meaningfully to behaviors, and which attributes are misleading, irrelevant,
and/or simply reflect random values.

Dietterich and Michalski (1985) suggest that the solution to
non-deterministic prediction might rely on any of three approaches to
variable-valued logic calculus: disjunctive-normal modeling, decomposition
modeling, or periodic modeling.* When objects or events in the opening
sequence make use of a small set of finite-valued attributes, experimental
evidence suggests that objects or events later in the sequence can be
predicted well by any of the three models, looking backward to various
depths at the preceding evidence. Given a stream of data which defies
categorization, all of these models will either labor for extended periods
of time without producing useful results, or "discover" one rule or another
which fits the data badly. Unfortunately, current implementations of all
three methods suffer from the weakness that they do not attempt to evaluate
the best solutions first, they have no way to assess the plausibility of
their results, and they cannot at present form composite models.

*A disjunctive-normal model builds upon "the fewest number of conjunctive
terms that covers all of the positive examples and none of the negative
examples" (p. 219). An iterative process generates an increasing number of
maximally-general expressions until no positive example remains which is
not already covered. The decomposition model iterates trial versions of
generalizations of attributes among pieces of evidence. Its intermediate
results are then tested against the negative evidence found in the data,
and it concludes only when the decomposition succeeds in excluding all
negative evidence. The periodic model expands on this latter approach,
testing conjuncts of positive evidence and comparing the degrees of overlap
between attribute selectors until some functional minimum of overlap is
reached, and at the same time no negative evidence remains included by the
hypotheses.
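
The covering idea behind a disjunctive-normal model can be illustrated with
a greedy sketch. It is a simplification made for illustration and not the
cited authors' algorithm: a greedy choice does not guarantee the fewest
terms, and the functions covers and greedy_dnf, the max_term_size limit,
and the use of string-valued attributes are all assumptions.

```python
from itertools import combinations


def covers(term, example):
    """A term is a tuple of (attribute, value) pairs, all of which must match."""
    return all(example.get(a) == v for a, v in term)


def greedy_dnf(positives, negatives, max_term_size=2):
    """Greedily pick conjunctive terms covering every positive, no negative."""
    candidates = set()
    for ex in positives:
        pairs = sorted(ex.items())
        for size in range(1, max_term_size + 1):
            candidates.update(combinations(pairs, size))
    # Discard any term that admits a negative example.
    candidates = [t for t in candidates
                  if not any(covers(t, n) for n in negatives)]
    uncovered, chosen = list(positives), []
    while uncovered:
        # Most newly covered positives first; shorter terms break ties.
        best = max(candidates, default=None,
                   key=lambda t: (sum(covers(t, p) for p in uncovered),
                                  -len(t), t))
        if best is None or not any(covers(best, p) for p in uncovered):
            return None   # no consistent cover within max_term_size
        chosen.append(best)
        uncovered = [p for p in uncovered if not covers(best, p)]
    return chosen
```

When positive and negative examples cannot be separated within the allowed
term size, the sketch returns None rather than a rule that fits badly.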

Summary 

Computerization of the diagnostic testing process in education is a
challenge of multiple dimensions, including both operational aspects and
philosophical underpinnings. Indeed, the question has been raised as to
whether computer software can adequately represent the subtle but important
common-sense elements which come into focus when the domain of interest is
not within a closed world (Bobrow & Hayes, 1985). As the preceding
analysis has shown, the closed world provides a substantially cleaner
environment within which to perform diagnostic inference. In the case of
educational diagnosis, most domains tend to be relatively open-ended and
thus no comparable clarity can be found.

If the test materials for computerized administration can be designed
within tightly controlled parameters, and if the diagnostic strategy can be
strongly tied to theory about performance errors within the topic domain,
then many of the ambiguities of diagnostic inference will be closer to
resolution. The algorithm that is used to select the next item in sequence
is also critical: along with item calibrations, a selection algorithm
could use diagnostically probative items, items which are particularly
suited to explore the examinee's misunderstanding of a given concept within
the text. Ideally, too, the pattern of erroneous performance of an
individual respondent, instant by instant, could be analyzed in the context
of similar patterns generated in previous testing sessions.
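
A selection rule of this kind might be sketched as follows. The item
format, the function pick_next_item, and the scoring rule are assumptions
made for illustration. An item counts as more probative the more distinct
responses the live hypotheses predict for it; among equally probative
items, the one calibrated closest to the examinee's level is preferred.

```python
def pick_next_item(items, live_hypotheses, ability, used=()):
    """Choose the unused item that best separates the live hypotheses.

    items: item id -> {'difficulty': float,
                       'predicts': hypothesis -> expected option}.
    """
    def score(item_id):
        item = items[item_id]
        predicted = {item["predicts"].get(h) for h in live_hypotheses}
        # More distinct predicted options means more probative; among
        # equals, prefer the difficulty closest to the examinee's level.
        return (len(predicted), -abs(item["difficulty"] - ability))

    available = [i for i in sorted(items) if i not in used]
    return max(available, key=score) if available else None
```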
Yet to be solved is the problem of diagnostic precision. Inherent in
most educational topics are unstudied assumptions about the ways in which
erroneous performance manifests itself. Psychometric evidence makes clear
that patterns of errors in multiple-choice test items occur in complex ICC
distributions which are neither directly interpretable from theory, nor
completely orthogonal to other traits of the test or the respondent. Thus,
at present, even with the best of computerized testing, the veracity of
diagnostic outcomes from computerized testing must be closely scrutinized.

The computer has proved itself valuable in managing more traditional
varieties of educational test administration and scoring. Properly
programmed, the computer can become an unparalleled asset in the context of
diagnostic testing, if certain limits are observed. Taken collectively,
the sheer number of limits both of a philosophical nature and in reference
to actual testing practice strongly suggests that the computer's role will
be supplementary to the educational diagnostic specialist. Breakthroughs,
however, could occur as soon as computer software moves into its next
generation of power, and as soon as educational theorists are able to build
detailed models of misunderstanding.